This notebook was developed to accompany the tutorial of a short course offered at the 2017 Annual Meeting of the American Political Science Association. The instructors for the course are Karsten Donnay (University of Konstanz), Eric Dunford (University of Maryland), Andrew Linke (University of Utah), Erin McGrath (University of Maryland), David Backer (University of Maryland), and David Cunningham (University of Maryland). The short course focuses on newly developed software tools, designed by the instructors, that enable more effective work with multiple datasets that have geospatial properties. Such datasets are increasingly employed in research throughout the social sciences. The course aims to familiarize participants with the use of these tools and associated best practices. At the end of the course, participants should understand why and how they could use these tools to support research that requires integrating datasets with particular geospatial properties.
The first part of the notebook walks through the functionality, applications and best practices of the geomerge package, which was just released. This package has been designed primarily to facilitate addressing challenges related to the integration of datasets with different geospatial properties. The package is illustrated using example data for Nigeria 2011. The illustration covers integration of Polygon, Raster and Point data, including how to generate spatial panel data.
The second part of the notebook walks through the functionality, applications and best practices of the meltt package, which was released earlier this year. This package has been designed to facilitate the integration of event data from multiple sources with differing properties. The package is illustrated by drawing on conflict event data from four prominent event datasets covering conflict observed in Nigeria during 2011.
The tutorial is designed to be hands-on, with participants working through the illustrative examples, accessing and processing datasets using the commands available in the geomerge and meltt packages. Doing so requires, at a minimum, an installation of the R programming software. Some knowledge of R is useful, though not mandatory. During this short course and tutorial, participants should learn about the utility, logic, and functionality of the two packages even without any significant expertise in R.
The use case: In practice, research involving spatial data typically entails drawing on multiple sources that provide information on distinct variables, each with a particular geographical resolution. Conducting analysis requires integrating these variables from the separate datasets into a common data frame, with a geographical resolution that is appropriately comparable across all the variables.
The main challenges: Separate datasets can have very different spatial data formats. For example, information on population or elevation is most often available as Raster data. Information on a country’s administrative subdivisions is typically provided as Polygon data. The locations of conflict events or incidents of crime are usually coded as Point data. In essence, these data formats correspond to different units of observation. Different units imply a spatial mismatch. When spatial data are mismatched, they may not be usable for particular types of analysis (unless purposely considering variables at different units of observation). Separate datasets may also treat the same variable as being of different types (e.g., numeric vs. categorical).
The technical challenges: A whole range of packages in R provide excellent functionality for dealing with these data integration problems, but no single, simple framework combines all this functionality. In addition, integrating different kinds of spatial data requires making assumptions and providing specifications for how to proceed with the integration.
The geomerge package provides this framework. The package allows for the automatic, flexible, transparent, reproducible integration of the most common types of spatial data. The integration can produce variables with the same spatial resolution, or merely establish the spatial correspondence of variables with different resolutions. In doing so, the package implements a number of established best practices that ensure robust results for many standard cases, while allowing for customization through optional parameters.
geomerge supports empirical research using spatial data in several important ways. First, the package streamlines the process of integrating data from multiple sources. Second, the package offers the flexibility of enabling users to generate variants of the same data. Each of these variants can reflect different assumptions about how to perform the integration, including in reference to the choice of spatial unit, as well as the choice of assignment, zonal function or point aggregation rules. Third, the variants can be used to test the robustness of analyses to assumptions about data integration. Fourth, the package contributes to transparency and simplifies replication by providing clear, standardized interfaces that document the assumptions users made when integrating data. The data and code used in performing any integration can be supplied to accompany similar code used in performing analysis.
The package can be installed through the CRAN repository.
#install.packages("geomerge")
We recommend installing the latest development version of the package from GitHub for the purposes of this tutorial. To download this version, you may first need to install the devtools package.
# install.packages("devtools")
devtools::install_github("css-konstanz/geomerge")
library(geomerge)
Before we get started, please set your working directory to the directory into which you unpacked the tutorial files (including the “data” directory).
setwd("YOUR DIRECTORY")
In this tutorial, we use a number of different data layers for Nigeria 2011 that constitute the example data distributed with the geomerge package. The data can be loaded using:
data(geomerge)
The example datasets cover all three main spatial data types discussed above:
ACLED (Point data): Conflict events for Nigeria in 2011 as recorded by the Armed Conflict Location & Event Data project, available from http://www.acleddata.com/data. This dataset contains geocoded, timestamped information on individual conflict events.
AidData (Point data, including locations geocoded to administrative divisions, but assigned coordinates of centroids): Activities of development aid projects in Nigeria with start dates in 2011 as recorded by AidData, available at http://aiddata.org. This dataset contains geocoded, timestamped information on individual aid projects.
Note: Both Point datasets are time-stamped, which means that they can be used for dynamic (i.e., spanning a spatial panel) as well as static (i.e., cross-sectional) integration.
geoEPR (Polygon data): All politically relevant ethnic groups for Nigeria in 2011, as recorded in the EPR-Core 2014 dataset, available at https://icr.ethz.ch/data/epr/geoepr/. This dataset assigns every politically relevant ethnic group one of six settlement patterns and provides polygons describing their location.
gpw (Raster data): Population at a gridded resolution of about 4km for Nigeria in 2010, as compiled by CIESIN, available at http://sedac.ciesin.columbia.edu/data/collection/gpw-v4. This dataset provides population estimates at several grid resolutions.
states (Polygon data): First-order administrative divisions (ADM1s) for Nigeria, i.e., its states (comparable to US states). The dataset is available at http://www.arcgis.com/home/item.html?id=0e58995046b74254911c1dc0eb756fa4. This dataset is used in the illustration as the target SpatialPolygonsDataFrame to which spatial data are merged. The polygons in states have been simplified to reduce the size of the SpatialPolygonsDataFrame and enable fast execution of the examples provided.
To familiarize yourself with these datasets, we recommend taking a closer look at them. To plot each dataset and see a handful of sample values:
library(raster)
## Loading required package: sp
## Warning: package 'sp' was built under R version 3.4.3
# Quick overview plot
plot(states)
# Show top rows of dataset
head(states@data)
## ID NAME_0 NAME_1
## 0 1 Nigeria Abia
## 1 2 Nigeria Adamawa
## 2 3 Nigeria Akwa Ibom
## 3 4 Nigeria Anambra
## 4 5 Nigeria Bauchi
## 5 6 Nigeria Bayelsa
# Quick overview plot
plot(states)
plot(ACLED, new=TRUE,add=TRUE)
# Show top rows of dataset
head(ACLED@data)
## GWNO EVENT_ID_C EVENT_ID_N timestamp YEAR TIME_PRECI
## 1 475 2962NIG 67219 2011-01-01 2011 1
## 2 475 2963NIG 67220 2011-01-03 2011 1
## 3 475 2964NIG 67221 2011-01-03 2011 1
## 4 475 2965NIG 67222 2011-01-04 2011 1
## 5 475 2966NIG 67223 2011-01-04 2011 1
## 6 475 2967NIG 67224 2011-01-04 2011 1
## EVENT_TYPE
## 1 Strategic development
## 2 Violence against civilians
## 3 Violence against civilians
## 4 Strategic development
## 5 Riots/Protests
## 6 Battle-No change of territory
## ACTOR1
## 1 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad
## 2 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad
## 3 Unidentified Armed Group (Nigeria)
## 4 DDM: Delta Democratic Militia
## 5 Rioters (Nigeria)
## 6 PDP: Peoples Democratic Party
## ALLY_ACTOR INTER1
## 1 <NA> 3
## 2 <NA> 3
## 3 <NA> 3
## 4 <NA> 3
## 5 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad 5
## 6 <NA> 3
## ACTOR2 ALLY_ACT_1
## 1 <NA> <NA>
## 2 Civilians (Nigeria) Police Forces of Nigeria (1999-2015)
## 3 Civilians (Nigeria) <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 RPN: Republican Party of Nigeria <NA>
## INTER2 INTERACTIO COUNTRY ADMIN1 ADMIN2 ADMIN3 LOCATION
## 1 0 30 Nigeria Plateau Jos North <NA> Jos
## 2 7 37 Nigeria Borno Maiduguri <NA> Maiduguri
## 3 7 37 Nigeria Borno Maiduguri <NA> Maiduguri
## 4 0 30 Nigeria Delta Ughelli North <NA> Ughelli
## 5 0 50 Nigeria Adamawa Gombi <NA> Jimeta
## 6 3 33 Nigeria Oyo Ogbomosho North <NA> Orogun
## LATITUDE LONGITUDE GEO_PRECIS SOURCE
## 1 9.92849 8.89212 1 Agence France Presse
## 2 11.84644 13.16027 1 Agence France Presse
## 3 11.84644 13.16027 1 Agence France Presse
## 4 5.48986 6.00743 1 BBC Monitoring Africa
## 5 9.28333 12.46667 1 BBC Monitoring Africa
## 6 8.15000 4.28330 1 ChannelsTV
## NOTES
## 1 Suspected Boko Haram arsonists burnt a church in a northern Nigerian city. Arsonists Saturday night who set a fire on the church that gutted a section of it before the fire was put out by residents. No one was hurt in the attack as there were no worshipp
## 2 Suspected members of a radical Islamist sect blamed for a spate of recent attacks in northern Nigeria shot dead an off-duty policeman in Maiduguri. The victim was wearing civilian clothes and was about to enter his home when the attack took place.
## 3 Gunmen killed three people at a movie theatre in a northern city in an attack police believe is politically-motivated ahead of general elections. The assailants were believed to be thugs loyal to a local politician.
## 4 A little-known group calling itself the Delta Democratic Militia claimed responsibility for an arson attack which razed the INEC offices in Delta state to the ground. The group claimed the attack was a warning against electoral malpractice in the upcomin
## 5 A riot broke out at Jimeta Prison complex when suspected Boko Haram inmates attempted a prison break by overpowering guards. The attempted break-out was unsuccessful.
## 6 Three people were killed in Orogun after a clash between supporters of two governorship candidates of the RPN and PDP.
## FATALITIES
## 1 0
## 2 1
## 3 3
## 4 0
## 5 0
## 6 3
# Quick overview plot
plot(states)
plot(AidData, new=TRUE,add=TRUE)
# Show top rows of dataset
head(AidData@data)
## project_id geoname_id precision_ place_name latitude longitude
## 120 104763105 2332453 4 Lagos 6.53774 3.35220
## 126 104895556 2328926 6 Nigeria 10.00000 8.00000
## 141 104924416 2352778 1 Abuja 9.05785 7.49508
## 142 104924722 2332453 4 Lagos 6.53774 3.35220
## 143 104924761 2328926 6 Nigeria 10.00000 8.00000
## 144 104924802 2328927 2 Niger Delta 4.83333 6.00000
## location_t geoname_ad
## 120 ADM1 6295630|6255146|NG|05
## 126 PCLI 6295630|6255146|NG
## 141 PPLC 6295630|6255146|NG|11|8635054|2352778
## 142 ADM1 6295630|6255146|NG|05
## 143 PCLI 6295630|6255146|NG
## 144 DLTA 6295630|6255146|NG|00
## geoname__1
## 120 Earth|Africa|Nigeria|Lagos
## 126 Earth|Africa|Nigeria
## 141 Earth|Africa|Nigeria|Federal Capital Territory|Municipal Area Council|Abuja
## 142 Earth|Africa|Nigeria|Lagos
## 143 Earth|Africa|Nigeria
## 144 Earth|Africa|Nigeria|Niger Delta
## aiddata_id aiddata_2_ year donor donor_iso donor_regi
## 120 104763105 <NA> 2011 Norway NO Europe
## 126 104895556 <NA> 2011 Australia AU Oceania
## 141 104924416 <NA> 2011 Norway NO Europe
## 142 104924722 <NA> 2011 Norway NO Europe
## 143 104924761 <NA> 2011 Norway NO Europe
## 144 104924802 <NA> 2011 Norway NO Europe
## implementi financing_ crs_bi_mul recipient recipient_
## 120 Carbon Limits AS NORAD 1 Nigeria NG
## 126 Public Sector AusAID 1 Nigeria NG
## 141 Jose Manuel Ramos MFA 1 Nigeria NG
## 142 INCAS Consulting MFA 1 Nigeria NG
## 143 INCAS Consulting MFA 1 Nigeria NG
## 144 INCAS Consulting MFA 1 Nigeria NG
## recipient1 timestamp end_date commitment planned_st
## 120 Africa, South of Sahara 2011-10-10 31/12/2012 2011/01/01 <NA>
## 126 Africa, South of Sahara 2011-07-01 30/6/2018 2011/01/01 <NA>
## 141 Africa, South of Sahara 2011-09-29 31/12/2012 2011/01/01 <NA>
## 142 Africa, South of Sahara 2011-10-31 31/12/2012 2011/01/01 <NA>
## 143 Africa, South of Sahara 2011-04-07 31/12/2011 2011/01/01 <NA>
## 144 Africa, South of Sahara 2011-03-31 31/12/2011 2011/01/01 <NA>
## planned_en
## 120 <NA>
## 126 <NA>
## 141 <NA>
## 142 <NA>
## 143 <NA>
## 144 <NA>
## title
## 120 Lagos State Gov - CDM development - sawdust utilization
## 126 ADS Intake 2012 - Consolidated
## 141 JPO Ingrid Midtgaard UNODC
## 142 Cultural week 2012
## 143 Fridtjov Nansen Nigeria-Sao Tome JDZ 2011
## 144 Integrated Stabilisation Framework, Niger Delta. Konflikthindtering
## short_desc
## 120 LAGOS STATE GOV - CDM DEVELOPMENT - SAWDUST UTILIZATION
## 126 ADS INTAKE 2012 - CONSOLIDATED
## 141 JPO INGRID MIDTGAARD UNODC
## 142 CULTURAL WEEK 2012
## 143 FRIDTJOV NANSEN NIGERIA-SAO TOME JDZ 2011
## 144 INTEGRATED STABILISATION FRAMEWORK, NIGER DELTA. KONFLIKTHINDTERING
## long_descr
## 120 CDM project development - The project focuses on utilization of the biomass waste in the sawmill community of Okobaba for a biomass fuel, thereby reducing CO2-emissions.
## 126 In-Australia costs for all Australian Development Awards long and short term courses
## 141 JPO Ingrid Midtgaard UNODC. Duty station: Abuja, Nigeria. Sector: Criminal Justice.
## 142 Cultural and business week in Lagos 22 -25 February 2012
## 143 Nigeria-Sao Tome & Principe Joint Development Authority (JDA) has asked the Norwegian Government to use the research vessel Dr. Fridtjov Nansen to investigate the marine resources, oceanography and environmental monitoring in connection with their newly
## 144 Integrated Stabilisation Framework for the Niger Delta Expert Working Group Rapport Nigeria. Konflikthindtering
## donor_proj donor_seco aiddata_se
## 120 NGA-12/0001 2011001606 230
## 126 11A758 2011000917 160
## 141 NGA-11/0005 2011001604 151
## 142 NGA-12/0002 2011001607 160
## 143 NGA-10/0012 2011001601 313
## 144 NGA-11/0002 2011001602 152
## aiddata__1 aiddata_pu
## 120 Energy generation and supply 23030
## 126 Other social infrastructure and services 16010
## 141 Government and civil society, general 0
## 142 Other social infrastructure and services 0
## 143 Fishing 0
## 144 Conflict prevention and resolution, peace and security 0
## aiddata__2 aiddata_ac
## 120 Power generation/renewable sources 23030.07
## 126 Social/ welfare services 16010.07|91010.01
## 141 <NA> <NA>
## 142 <NA> <NA>
## 143 <NA> <NA>
## 144 <NA> <NA>
## aiddata__3
## 120 Biomass
## 126 Culture and recreation|All items relating to otherwise unspecified adminstrative costs of donors
## 141 <NA>
## 142 <NA>
## 143 <NA>
## 144 <NA>
## flow_name crs_sector crs_sect_1
## 120 ODA Grants 230 II.3. Energy
## 126 ODA Grants 430 IV.2. Other Multisector
## 141 ODA Grants 151 I.5.a. Government & Civil Society-general
## 142 ODA Grants 160 I.6. Other Social Infrastructure & Services
## 143 ODA Grants 313 III.1.c. Fishing
## 144 ODA Grants 152 I.5.b. Conflict, Peace & Security
## crs_purpos crs_purp_1
## 120 23070 Biomass
## 126 43081 Multisector education/training
## 141 15113 Anti-corruption organisations and institutions
## 142 16061 Culture and recreation
## 143 31320 Fishery development
## 144 15220 Civilian peace-building, conflict prevention and resolution
## coalesced_ coalesced1
## 120 23030 Power generation/renewable sources
## 126 16010 Social/ welfare services
## 141 15120 Public sector financial management
## 142 16010 Social/ welfare services
## 143 31320 Fishery development
## 144 15220 Civilian peace-building, conflict prevention and resolution
## commitme_1 total_proj crs_trade crs_climat crs_biodiv crs_gender
## 120 44071 0 0 1 0 0
## 126 29217 0 0 0 0 1
## 141 53527 0 0 0 0 0
## 142 17842 0 0 0 0 0
## 143 713699 0 0 0 0 0
## 144 592371 0 0 0 0 0
## crs_enviro crs_desert pdgg channel_co finance_t associated future_ds_
## 120 0 <NA> 0 52000 C01 <NA> 0
## 126 0 <NA> <NA> 51000 E01 <NA> 0
## 141 0 <NA> 2 41128 D01 <NA> 0
## 142 0 <NA> 0 52000 G01 <NA> 0
## 143 2 <NA> 0 51000 C01 <NA> 0
## 144 0 <NA> 2 52000 C01 <NA> 0
## future_ds1 received_a irtc_amoun untied_amo tied_amoun partial_ti
## 120 0 0 0 44071 0 0
## 126 0 0 0 29217 0 0
## 141 0 0 0 53527 0 0
## 142 0 0 0 0 0 0
## 143 0 0 0 713699 0 0
## 144 0 0 0 592371 0 0
## finance_t2 arrears_in arrears_pr initial_re ftc repay_type outstandin
## 120 C01 0 0 1 <NA> <NA> 0
## 126 E01 0 0 8 1 <NA> 0
## 141 D01 0 0 1 1 <NA> 0
## 142 G01 0 0 1 <NA> <NA> 0
## 143 C01 0 0 1 <NA> <NA> 0
## 144 C01 0 0 1 <NA> <NA> 0
## interest_a expert_com export_cre expert_ext additional source
## 120 0 0 0 0 <NA> OECD
## 126 0 0 0 0 <NA> OECD
## 141 0 0 0 0 <NA> OECD
## 142 0 0 0 0 <NA> OECD
## 143 0 0 0 0 <NA> OECD
## 144 0 0 0 0 <NA> OECD
## source_det
## 120 CRS Online 2012
## 126 CRS Online 2012
## 141 CRS Online 2012
## 142 CRS Online 2012
## 143 CRS Online 2012
## 144 CRS Online 2012
# Quick overview plot
plot(geoEPR)
# Show top rows of dataset
head(geoEPR@data)
## EPRgroup
## 0 Hausa-Fulani and Muslim Middle Belt
## 1 Yoruba
## 2 Igbo
## 3 Tiv
## 4 Ijaw
## 5 Ogoni
The main functionality of the geomerge package is provided by a single function with the same name. The output of the function is an object of class “geomerge”, which is a list with three slots: (1) data contains the spatial data resulting from integration, (2) inputData stores the input dataset, and (3) parameters logs all parameters with which geomerge was executed.
Running geomerge has two basic requirements.
The first requirement is input data, comprised of any number of objects of type SpatialPolygonsDataFrame, SpatialPointsDataFrame and RasterLayer. The RasterLayer will always by definition be single-valued. Therefore, geomerge requires the user to select one specific variable in each of the SpatialPolygonsDataFrame and SpatialPointsDataFrame objects prior to integration. SpatialPointsDataFrame may also contain a second column named timestamp, which can be used for dynamic integration.
The rationale is that the package uses the name of the input data to label the corresponding variables in the integrated data. This approach establishes a clear, unique link between the input and integrated data. If a user wishes to work with several variables from the same dataset, simply enter these variables as separate arguments (with unique names). We generally advise users to rely on meaningful names when labeling input data.
The second requirement, called target, specifies the spatial structure to which variables from all input objects are merged. The example in the geomerge package requires this target to be of class SpatialPolygonsDataFrame. In practice, the spatial structure can have any shape (e.g., polygons of administrative units, raster cells, etc.).
Note: The package provides a useful helper function called generateGrid, which generates a grid of user-specified cell size for the spatial extent defined by a spatial R object.
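To give a sense of what a regular grid target looks like, the following base R sketch lists the cells of a grid covering a bounding box. It is not the implementation of the package's grid helper: the function name make_grid_cells, the bounding box and the cell size are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of geomerge): list the cells of a regular
# grid covering a bounding box, one row per cell.
make_grid_cells <- function(xmin, xmax, ymin, ymax, cell_size) {
  # Lower-left corners of all cells; the small offset avoids starting an
  # extra cell exactly on the upper boundary
  x0 <- seq(xmin, xmax - 1e-9, by = cell_size)
  y0 <- seq(ymin, ymax - 1e-9, by = cell_size)
  cells <- expand.grid(xmin = x0, ymin = y0)
  cells$xmax <- cells$xmin + cell_size
  cells$ymax <- cells$ymin + cell_size
  cells$FID <- seq_len(nrow(cells))
  cells[, c("FID", "xmin", "xmax", "ymin", "ymax")]
}

# Illustrative bounding box in degrees (assumed values), 1-degree cells
grid_cells <- make_grid_cells(xmin = 2, xmax = 15, ymin = 4, ymax = 14,
                              cell_size = 1)
head(grid_cells)
```

Each row describes one rectangular cell. Turning these rows into an actual SpatialPolygonsDataFrame would require the sp package and is left out of this sketch.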
geomerge assumes that all inputs of type SpatialPolygonsDataFrame and RasterLayer are static and contemporary. If the polygons or raster are changing, we advise users to rerun geomerge for each interval in which data are static and contemporary. The package allows for dynamic integration of all inputs that are a SpatialPointsDataFrame. For example, one can automatically generate the counts of events that occur within a specific unit of target within a specific time period.
geomerge has a number of other optional arguments, which we will explore further below. These optional arguments enable specific kinds of integration (i.e., dynamic vs. static) and/or allow the user to change assumptions about zonal functions, assignment rules, etc. from the default values.
Note: The print, summary and plot functions are overloaded for objects of class “geomerge”, meaning that these functions return specific outputs for objects of class “geomerge”.
The simplest case is that of merging static layers. Consider, for example, merging geospatial information about the settlement areas of ethnic groups with the administrative units of a country to determine which group is dominant in each unit. In the following examples, we therefore assume that the target of integration is the states SpatialPolygonsDataFrame.
Let’s begin by integrating one Polygon dataset with states.
output = geomerge(geoEPR,target=states)
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(geoEPR, target = states)
##
## NOTE: The extent of input geoEPR is smaller than that of target. This might lead to NA values.
##
## Running geomerge in static mode.
## Dataset1: geoEPR (SpatialPolygonsDataFrame)
## Merging polygon data...
## NOTE: No spatial lags calculated for geoEPR since data is non-numeric.
## Done.
## Dataset geoEPR successfully merged to target.
## Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 non numerical variable(s) are available:
## geoEPR
names(output$data)
## [1] "FID" "ID" "NAME_0" "NAME_1" "area" "geoEPR"
Notice that the function returns a number of messages documenting the progress of the integration task. When merging more complex data, the function may run for some time and monitoring progress can therefore be relevant. If no printed progress updates are required, simply use the optional argument silent = TRUE.
output = geomerge(geoEPR,target=states,silent=TRUE)
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 non numerical variable(s) are available:
## geoEPR
Here, the default settings of geomerge make implicit assumptions regarding the assignment of the values in geoEPR to the target, the states SpatialPolygonsDataFrame. The default assignment rule uses maximum area overlap (assignment = "max(area)"). Under this rule, each spatial unit of target is assigned the value of the unit in geoEPR with which it has the largest spatial overlap.
As an alternative, geomerge supports assignment based on minimal area overlap (assignment = "min(area)").
Assignment can also be done by maximum population (assignment = "max(pop)") or minimum population (assignment = "min(pop)"), which operate similarly to the area-based rules.
In addition, geomerge permits assignment weighted by area (assignment = "weighted(area)") or population (assignment = "weighted(pop)"). The former assigns the value that is the area-weighted average across all units intersecting with the spatial unit in target. The latter is analogous, but assigns the value based on the population represented by that area.
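The logic of these assignment rules can be sketched in base R on a toy table of overlaps. This is only an illustration of the rules as described above, not the code geomerge runs internally; the table, its numbers and the helper names assign_extreme and assign_weighted are all made up.

```r
# Toy overlap table (made-up numbers): each row is the intersection of one
# target unit with one source polygon.
overlaps <- data.frame(
  target = c("A", "A", "A", "B", "B"),
  group  = c("g1", "g2", "g3", "g1", "g3"),  # categorical source value
  score  = c(10, 20, 40, 10, 40),            # numeric source value
  area   = c(50, 30, 20, 10, 90),            # overlap area
  pop    = c(100, 700, 200, 300, 100),       # population in the overlap
  stringsAsFactors = FALSE
)

# Hypothetical helper: per target unit, take the source value whose weight
# (area or population) is largest (which.max) or smallest (which.min)
assign_extreme <- function(d, value, weight, pick = which.max) {
  sapply(split(d, d$target), function(u) u[[value]][pick(u[[weight]])])
}

# Hypothetical helper: per target unit, weighted average of a numeric value
assign_weighted <- function(d, value, weight) {
  sapply(split(d, d$target),
         function(u) weighted.mean(u[[value]], u[[weight]]))
}

max_area <- assign_extreme(overlaps, "group", "area", which.max)
min_area <- assign_extreme(overlaps, "group", "area", which.min)
max_pop  <- assign_extreme(overlaps, "group", "pop", which.max)
w_area   <- assign_weighted(overlaps, "score", "area")
w_pop    <- assign_weighted(overlaps, "score", "pop")
```

In this toy table, unit A is assigned group g1 under maximum area but g2 under maximum population, which shows how the choice of rule can change the integrated data. The weighted variants use the numeric column because averaging a categorical value is not meaningful.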
Naturally, all the options relying on population require a population raster input called population.data. Here is an example:
output = geomerge(geoEPR,target=states,
silent=TRUE,assignment="max(pop)",
population.data=gpw)
##
## Generating zonal statistics for population based assignment... Done.
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 non numerical variable(s) are available:
## geoEPR
Note: Any weighted assignment (whether area- or population-based) is only allowed for numeric data. Within our illustration, therefore, weighted assignment is not possible for the layer geoEPR.
The integration of Raster data is similarly straightforward.
Note: geomerge accepts any optional arguments of the function extract in the raster package. These arguments can be entered in the exact same syntax as in the original extract function and are passed on to any use of the function within the package. For example, in the illustration we use the optional input na.rm = TRUE because the gpw data has a few missing values that we want to ignore when performing the data integration.
output = geomerge(gpw,na.rm=TRUE,target=states)
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(gpw, na.rm = TRUE, target = states)
##
## Running geomerge in static mode.
## Dataset1: gpw (RasterLayer)
## Generating zonal statistics... Done.
## Dataset gpw successfully merged to target.
## Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 numerical variable(s) are available:
## gpw
plot(output)
As can be seen in the summary, the package not only merged the layer gpw to states, but also generated its value per area of the target polygon and first- and second-order spatial lag values for each. For inputs of type RasterLayer, values per area are always also returned. Whether or not spatial lags should be calculated can be controlled by the optional Boolean argument spat.lag.
output = geomerge(gpw,na.rm=TRUE,target=states,spat.lag=FALSE)
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(gpw, na.rm = TRUE, target = states, spat.lag = FALSE)
##
## Running geomerge in static mode.
## Dataset1: gpw (RasterLayer)
## Generating zonal statistics... Done.
## Dataset gpw successfully merged to target.
## Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 numerical variable(s) are available:
## gpw
plot(output)
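For readers unfamiliar with spatial lags, the following base R sketch computes first- and second-order lags on a toy example. It assumes that a lag is the mean of a variable over neighboring units, and that second-order neighbors are units reachable in two steps that are neither the unit itself nor a direct neighbor. The exact definition used inside geomerge may differ; the adjacency structure, the values and the helper lag_mean are made up.

```r
# Toy example: four units in a row (1-2-3-4), so unit 2 borders 1 and 3, etc.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)                  # symmetric first-order adjacency

values <- c(10, 20, 30, 40)          # made-up variable, one value per unit

# Second-order neighbors: reachable in two steps, excluding the unit itself
# and its first-order neighbors
adj2 <- (adj %*% adj > 0) * 1
diag(adj2) <- 0
adj2[adj == 1] <- 0

# Hypothetical helper: mean of x over each unit's neighbor set
# (NA when a unit has no neighbors)
lag_mean <- function(w, x) {
  n <- rowSums(w)
  ifelse(n > 0, as.vector(w %*% x) / n, NA_real_)
}

lag1 <- lag_mean(adj, values)
lag2 <- lag_mean(adj2, values)
```

Unit 2, for instance, gets a first-order lag equal to the mean of units 1 and 3, and a second-order lag equal to the value of unit 4.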
As in the case of Polygon data, the defaults of geomerge have built-in implicit assumptions regarding zonal statistics. The default zonal function is summation (zonal.fun = sum). The package also supports all zonal statistics consistent with the extract function in the raster package.
output = geomerge(gpw,na.rm=TRUE,target=states,
spat.lag=FALSE,zonal.fun=min)
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(gpw, na.rm = TRUE, target = states, spat.lag = FALSE,
## zonal.fun = min)
##
## Running geomerge in static mode.
## Dataset1: gpw (RasterLayer)
## Generating zonal statistics... Done.
## Dataset gpw successfully merged to target.
## Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 numerical variable(s) are available:
## gpw
plot(output)
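To make the role of the zonal function concrete, here is a minimal base R sketch that applies sum and min to the cells of a toy raster grouped by zone. It mimics the idea of zonal statistics only; it does not use the raster package, and the matrix, the zone layout and the helper zonal_stat are made up.

```r
# Toy raster: a 4 x 4 matrix of cell values with one missing cell
cell_values <- matrix(c(5, 8,  2, NA,
                        1, 4,  7,  3,
                        6, 9, 10,  2,
                        3, 1,  4,  8), nrow = 4, byrow = TRUE)

# Zone membership of each cell: the two left columns form zone "west",
# the two right columns form zone "east"
zones <- matrix(rep(c("west", "west", "east", "east"), times = 4),
                nrow = 4, byrow = TRUE)

# Hypothetical helper: apply a zonal function to the cells of each zone,
# ignoring missing cells (comparable in spirit to passing na.rm = TRUE)
zonal_stat <- function(values, zones, fun) {
  tapply(as.vector(values), as.vector(zones), fun, na.rm = TRUE)
}

zonal_sum <- zonal_stat(cell_values, zones, sum)
zonal_min <- zonal_stat(cell_values, zones, min)
```

Summation suits count-like rasters such as population, whereas min, max or mean describe the distribution of cell values within a zone rather than a total.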
In geomerge, integration of point data supports two different heuristics, which the user specifies via point.agg. The first heuristic (point.agg = "cnt") counts the occurrence of points in a given unit of target. The second heuristic (point.agg = "sum") sums the values for all points in a given unit. This heuristic is only appropriate for numeric variables.
To illustrate, we use information on the conflict fatalities as recorded in ACLED and the financial commitments of development aid projects as recorded in AidData. We start by looking at the event counts and the number of projects in each state of Nigeria throughout 2011 using point.agg = "cnt". Then we examine the total numbers of conflict fatalities and aid dollar commitments associated with those areas.
# First select the corresponding columns only
ACLED.fatalities = ACLED[,names(ACLED)=='FATALITIES']
AidData.commitment = AidData[,names(AidData)=='commitme_1']
# Run geomerge using point.agg = 'cnt'
output = geomerge(ACLED.fatalities,AidData.commitment,target=states,point.agg='cnt')
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, AidData.commitment, target = states,
## point.agg = "cnt")
##
## Running geomerge in static mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Dataset2: AidData.commitment (SpatialPointsDataFrame)
## Aggregating point data... Done.
## Dataset AidData.commitment successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in static mode.
##
## The following 2 numerical variable(s) are available:
## ACLED.fatalities, AidData.commitment
plot(output)
# Run geomerge using point.agg = 'sum'
output = geomerge(ACLED.fatalities,AidData.commitment,target=states,point.agg='sum')
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, AidData.commitment, target = states,
## point.agg = "sum")
##
## Running geomerge in static mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Dataset2: AidData.commitment (SpatialPointsDataFrame)
## Aggregating point data... Done.
## Dataset AidData.commitment successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in static mode.
##
## The following 2 numerical variable(s) are available:
## ACLED.fatalities, AidData.commitment
plot(output)
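The difference between the two point aggregation heuristics can be sketched in base R as follows. The sketch assumes each point has already been matched to a target unit and that units without points receive 0; both are simplifying assumptions, and the data are made up.

```r
# Toy point data: each event has already been matched to a target unit
events <- data.frame(
  unit       = c("A", "A", "B", "A", "C"),
  fatalities = c(0, 3, 1, 2, 0)
)
units <- c("A", "B", "C", "D")       # unit "D" contains no events
unit_factor <- factor(events$unit, levels = units)

# "cnt"-style aggregation: number of points per unit
cnt <- table(unit_factor)

# "sum"-style aggregation: total of the numeric variable per unit
total <- tapply(events$fatalities, unit_factor, sum)
total[is.na(total)] <- 0             # assumption: units without points get 0
```

Unit C shows why the two heuristics answer different questions: it has one event but zero fatalities, so it is non-zero under counting and zero under summation.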
Dynamic integration of point data follows the same process as before, but separated into a series of temporal units, thereby generating a spatial panel. In geomerge, the temporal units are specified through the time argument. The package performs static integration if time = NA. For dynamic integration, the user must specify time = c(start_date, end_date, interval_length). All three inputs must be strings, where interval_length is defined in multiples of t_unit. The default value is t_unit = "days". The package also accepts inputs of “secs”, “mins”, “hours”, “months” or “years”.
In the following illustration, we employ the same data as before, but now include the “timestamp” column from both datasets. Information capturing the timing of observations is a prerequisite for dynamic integration. The information does not have to be at any specific level of precision, but does have to concern timing. We iterate through the whole year 2011 in one-month steps. In other words, we generate a state-month spatial panel.
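The idea of splitting a time range into intervals and assigning each timestamped point to one of them can be sketched in base R. This illustrates the concept only, for one-month intervals over 2011. How geomerge treats interval endpoints internally is not shown here, so the left-closed, right-open intervals below are an assumption, and the event dates are made up.

```r
# Interval boundaries for a monthly panel over 2011 (13 breaks, 12 periods)
breaks <- seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by = "1 month")

# Made-up event dates
timestamps <- as.Date(c("2011-01-01", "2011-01-03", "2011-02-28",
                        "2011-07-15", "2011-12-31"))

# Period index of each event; each interval includes its start date and
# excludes the start date of the next interval (assumed convention)
period <- findInterval(as.numeric(timestamps), as.numeric(breaks))

# Number of events per period, keeping periods without events
events_per_period <- tabulate(period, nbins = length(breaks) - 1)
```

Repeating such a per-period count for every unit of the target yields one row per unit and period, which is the structure of a spatial panel.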
# First select the corresponding columns only
ACLED.fatalities = ACLED[,names(ACLED)%in%c('timestamp','FATALITIES')]
AidData.commitment = AidData[,names(AidData)%in%c('timestamp','commitme_1')]
# Run geomerge using point.agg = 'cnt'
output = geomerge(ACLED.fatalities,AidData.commitment,
target=states,time=c("2011-01-01","2011-12-31","1"),
t_unit='months',point.agg='cnt')
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, AidData.commitment, target = states,
## time = c("2011-01-01", "2011-12-31", "1"), point.agg = "cnt",
## t_unit = "months")
##
## Running geomerge in dynamic mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Dataset2: AidData.commitment (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset AidData.commitment successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
##
## The following 2 numerical variable(s) are available:
## ACLED.fatalities, AidData.commitment
##
## First and second order temporal lag values available.
plot(output)
## Output data is spatial panel, showing results only for the last period. Use optional argument "period" to select specific time period.
# Run geomerge using point.agg = 'sum'
output = geomerge(ACLED.fatalities,AidData.commitment,
target=states,time=c("2011-01-01","2011-12-31","1"),
t_unit='months',point.agg='sum')
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, AidData.commitment, target = states,
## time = c("2011-01-01", "2011-12-31", "1"), point.agg = "sum",
## t_unit = "months")
##
## Running geomerge in dynamic mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Dataset2: AidData.commitment (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset AidData.commitment successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
##
## The following 2 numerical variable(s) are available:
## ACLED.fatalities, AidData.commitment
##
## First and second order temporal lag values available.
plot(output)
## Output data is spatial panel, showing results only for the last period. Use optional argument "period" to select specific time period.
Note: By default, plot visualizes the last time period. To visualize any other period, pass the optional argument period to plot. In addition, first- and second-order time-lagged variables are returned by default. The optional Boolean argument time.lag controls this feature.
# Run geomerge using point.agg = 'sum', without time-lagged variables
output = geomerge(ACLED.fatalities,AidData.commitment,
target=states,time=c("2011-01-01","2011-12-31","1"),
t_unit='months',point.agg='sum',time.lag=FALSE)
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, AidData.commitment, target = states,
## time = c("2011-01-01", "2011-12-31", "1"), time.lag = FALSE,
## point.agg = "sum", t_unit = "months")
##
## Running geomerge in dynamic mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Dataset2: AidData.commitment (SpatialPointsDataFrame)
## Aggregating point data for period 1... Done.
## Aggregating point data for period 2... Done.
## Aggregating point data for period 3... Done.
## Aggregating point data for period 4... Done.
## Aggregating point data for period 5... Done.
## Aggregating point data for period 6... Done.
## Aggregating point data for period 7... Done.
## Aggregating point data for period 8... Done.
## Aggregating point data for period 9... Done.
## Aggregating point data for period 10... Done.
## Aggregating point data for period 11... Done.
## Aggregating point data for period 12... Done.
## Dataset AidData.commitment successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
##
## The following 2 numerical variable(s) are available:
## ACLED.fatalities, AidData.commitment
plot(output, period=3)
## Output data is spatial panel, showing variables only for period 3, as specified.
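The first- and second-order temporal lags mentioned in the note above can be pictured with a small base R sketch. It uses a made-up toy panel and hypothetical column names (events.t_1, events.t_2); it does not reproduce how geomerge stores or names its lagged variables, and is only meant to show what a within-unit lag of one or two periods looks like.

```r
# Hypothetical toy panel: 2 spatial units observed over 4 periods (made-up values)
panel <- data.frame(
  unit   = rep(c("A", "B"), each = 4),
  period = rep(1:4, times = 2),
  events = c(3, 0, 5, 2, 1, 1, 0, 4)
)

# Shift a vector back by k periods, padding the first k entries with NA
# (assumes k is smaller than the number of periods)
lag_by <- function(x, k) c(rep(NA, k), head(x, -k))

# Sort by unit and period, then compute the lags separately within each unit
panel <- panel[order(panel$unit, panel$period), ]
panel$events.t_1 <- ave(panel$events, panel$unit,
                        FUN = function(x) lag_by(x, 1))
panel$events.t_2 <- ave(panel$events, panel$unit,
                        FUN = function(x) lag_by(x, 2))
print(panel)
```

Lagging within each unit matters: without the grouping, the first period of unit B would wrongly inherit the last value of unit A.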
Thus far, we have only considered integration targets in the form of the Nigeria county polygons (states). The generateGrid function in geomerge allows the user to easily generate a grid of a chosen resolution that covers the same area. For many econometric applications, this option can be very useful.
# install.packages("sp")
require(sp)
# Generate grid with 10 km cell size (input in m) in local CRS for Nigeria
states.grid <- generateGrid(states,
size= 10000, # meters
local.CRS=CRS("+init=epsg:26391"),
silent = TRUE)
# Run simple static integration with this grid as target
output = geomerge(ACLED.fatalities,target=states.grid,point.agg='sum')
## geomerge: Geospatial data integration.
## Karsten Donnay and Andrew Linke, 2017
##
## ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
##
##
## geomerge(ACLED.fatalities, target = states.grid, point.agg = "sum")
##
## Running geomerge in static mode.
## Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
## Aggregating point data... Done.
## Dataset ACLED.fatalities successfully merged to target.
## Completed!
summary(output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
##
## The following 1 numerical variable(s) are available:
## ACLED.fatalities
plot(output)
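As a rough picture of what aggregating point data onto a regular grid involves, the following base R sketch assigns a handful of made-up points to 10 km cells and computes both a count and a sum per cell, mirroring the 'cnt' and 'sum' options of point.agg. The coordinates, values, grid origin at (0, 0) and cell labels are all assumptions for illustration; geomerge itself works with spatial objects and handles projections, which this sketch does not attempt.

```r
# Hypothetical points in projected coordinates (meters); values are made up
pts <- data.frame(
  x = c(1200, 8700, 9100, 15300, 15900, 23000),
  y = c(500, 4200, 9900, 12000, 18000, 3000),
  fatalities = c(2, 0, 5, 1, 7, 3)
)

cell_size <- 10000  # 10 km cells, in meters

# Index of the cell each point falls into (grid origin at 0,0 is an assumption)
pts$col  <- floor(pts$x / cell_size)
pts$row  <- floor(pts$y / cell_size)
pts$cell <- paste(pts$row, pts$col, sep = "_")

# 'cnt'-style aggregation: number of points per cell
cnt <- tapply(pts$fatalities, pts$cell, length)
# 'sum'-style aggregation: total of the variable per cell
total <- tapply(pts$fatalities, pts$cell, sum)

print(data.frame(cell = names(cnt),
                 cnt  = as.vector(cnt),
                 sum  = as.vector(total)))
```

Cells that contain no points simply do not appear in this toy output. The two aggregates can differ substantially for the same cell, which is why the choice between counting events and summing a variable such as fatalities should follow from the research question.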